81 research outputs found

    Catálogo de herramientas informáticas relacionadas con la creación, gestión y explotación de corpus textuales

    Corpora are important linguistic resources from which a great deal of information on real language use can be obtained. This work surveys the main resources in the field, encompassing both precompiled corpora and IT resources for their compilation, processing and exploitation.

    Spanish named entity recognition in the biomedical domain

    Named Entity Recognition in the clinical domain, and in languages other than English, is hampered by the absence of complete dictionaries, the informality of texts, the polysemy of terms, the lack of agreement on entity boundaries, and the scarcity of corpora and of other available resources. We present a Named Entity Recognition method for poorly resourced languages. The method was tested on Spanish radiology reports and compared with a conditional random fields system.
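
    The conditional random fields system used for comparison relies on local token features. A minimal sketch of the kind of feature extraction typically fed to such a sequence labeller (feature names and the sample report are illustrative assumptions, not taken from the paper):

```python
# Hypothetical sketch: local token features of the kind commonly used by a
# CRF sequence labeller for clinical NER. Not the paper's actual feature set.
def token_features(tokens, i):
    """Return a feature dict for the token at position i."""
    w = tokens[i]
    return {
        "word.lower": w.lower(),
        "word.istitle": w.istitle(),   # capitalisation often signals entities
        "word.isdigit": w.isdigit(),
        "suffix3": w[-3:],             # suffixes capture morphology
        "prev.lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next.lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

report = "Opacidad en lobulo superior derecho".split()
print(token_features(report, 1))
```

    In a full system, one such dictionary per token would be passed to a CRF toolkit together with BIO-style labels.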

    A descriptive study of WordNet (MCR) and linguistic synsets

    This article presents the work carried out to apply WordNet MCR to the linguistic domain and discusses the problematic situations generated by the WordNet structure and by the characteristics inherent to the domain. A descriptive approach was used to explain how maintaining the original WordNet structure can affect the extension of a specific domain. Our results show that, in order to extend domain-specific synsets, a structural reorganisation is unavoidable.

    Arabic medical entity tagging using distant learning in a Multilingual Framework

    A semantic tagger is presented that detects relevant entities in Arabic medical documents and tags them with their appropriate semantic class. The system takes advantage of a multilingual framework covering four languages (Arabic, English, French, and Spanish), so that resources available for each language can be used to improve the results of the others; this is especially important for less-resourced languages such as Arabic. The approach has been evaluated against Wikipedia pages in the four languages belonging to the medical domain. The core of the system is the definition of a base tagset consisting of the three most represented classes in the SNOMED-CT taxonomy, and the learning of a binary classifier for each semantic category in the tagset and each language, using a distant learning approach over three widely used knowledge resources, namely Wikipedia, DBpedia, and SNOMED-CT.
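
    The distant-learning idea described above can be sketched as follows (a toy illustration; the term list is a hard-coded stand-in for the SNOMED-CT / DBpedia resources, and all names are assumptions, not the paper's code):

```python
# Hypothetical sketch of distant supervision: use a knowledge resource's term
# list to silver-label sentences as positive/negative training examples for a
# per-class binary classifier. The term set below is a toy stand-in.
DISORDER_TERMS = {"pneumonia", "diabetes", "asthma"}

def silver_label(sentence, terms=DISORDER_TERMS):
    """Label 1 if any known term of the class occurs in the sentence, else 0."""
    tokens = {t.strip(".,;").lower() for t in sentence.split()}
    return int(bool(tokens & terms))

corpus = [
    "The patient was diagnosed with pneumonia.",
    "Follow-up visit scheduled for next month.",
]
print([silver_label(s) for s in corpus])  # -> [1, 0]
```

    The labels obtained this way are noisy ("silver standard"), which is why one classifier per class and language is trained and their outputs later combined.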

    TASS2018: Medical knowledge discovery by combining terminology extraction techniques with machine learning classification

    In this paper we present the procedure followed to complete the run submitted by the UPF-UPC team to the TASS 2018 Task 3 challenge. The procedure may be classified, according to the organization's codes, as H-KB-S, since it combines a knowledge-based methodology with supervised methods. Our pipeline includes: i) a standard pre-processing of the documents using the FreeLing tool suite (POS tagging and dependency parsing); ii) the use of a CRF sequence-labelling tool to complete both subtask A (key phrase identification) and subtask B (key phrase classification); and iii) addressing subtask C (setting semantic relationships) with a hybrid approach that integrates two Logistic Regression classifiers, followed by shallow lexical relation extractors for entity/entity pairs related by is-a and same-as relations.

    Semantic tagging and normalization of French medical entities

    In this paper we present two tools for addressing Task 2 in CLEF eHealth 2016. The first is a semantic tagger that detects relevant entities in French medical documents, tags them with their appropriate semantic class, and normalizes them with the Semantic Group codes defined in the UMLS. It is based on a distant learning approach that uses several SVM classifiers whose outputs are combined into a single result. The second tool is based on a symbolic procedure that obtains the English translation of each medical term and looks for normalization information in publicly accessible resources.

    Semantic tagging of French medical entities using distant learning

    In this paper we present a semantic tagger that detects relevant entities in French medical documents and tags them with their appropriate semantic class. These experiments have been carried out in the framework of the CLEF 2015 eHealth contest, which proposes a tagset of ten classes from the UMLS taxonomy. The system uses a set of binary classifiers and a combination mechanism for merging the classifiers' results. The classifiers are trained from two widely used knowledge sources, one domain-restricted and the other domain-independent.
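
    A simple way to combine per-class binary classifiers, as in the system above, is to take the highest-scoring class and fall back to a null label below a confidence threshold. A minimal sketch (the threshold, class names and fallback label are illustrative assumptions, not the paper's actual mechanism):

```python
# Hypothetical one-vs-rest combination: each semantic class has its own binary
# classifier producing a confidence score; the best class wins, with an "O"
# (outside) fallback when no classifier is confident enough.
def combine(scores, threshold=0.5):
    """scores: dict mapping class name -> binary classifier confidence."""
    best_class = max(scores, key=scores.get)
    return best_class if scores[best_class] >= threshold else "O"

print(combine({"Disorder": 0.81, "Chemical": 0.34, "Anatomy": 0.12}))  # -> Disorder
print(combine({"Disorder": 0.20, "Chemical": 0.34}))                   # -> O
```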

    Syntactic methods for negation detection in radiology reports in Spanish

    Identification of the certainty of events is an important text-mining problem. In particular, biomedical texts report medical conditions or findings that may be factual, hedged or negated. Identifying negation and its scope over a term of interest determines whether a finding is actually reported, and is a challenging task. Little work has been done for Spanish in this domain. In this work we introduce different algorithms developed to determine whether a term of interest is under the scope of negation in radiology reports written in Spanish. The methods include syntactic techniques based on rules derived from PoS-tagging patterns, constituent-tree patterns and dependency-tree patterns, as well as an adaptation of NegEx, a well-known rule-based negation detection algorithm (Chapman et al., 2001a). All methods outperform a simple dictionary-lookup algorithm developed as a baseline. NegEx and the PoS-tagging pattern method obtain the best results, with an F1 of 0.92.
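
    The core of a NegEx-style check can be sketched in a few lines (a simplified illustration, not the adapted system itself; the trigger list is a small sample, not the full lexicon of Chapman et al., 2001):

```python
# Simplified NegEx-style rule: a finding is considered negated if a negation
# trigger appears within a fixed window of tokens before it.
NEG_TRIGGERS = {"no", "sin", "niega", "descarta"}  # sample Spanish negation cues
WINDOW = 5  # how many tokens to look back

def is_negated(tokens, term_index, triggers=NEG_TRIGGERS, window=WINDOW):
    """True if a trigger occurs in the `window` tokens before the term."""
    start = max(0, term_index - window)
    return any(t.lower() in triggers for t in tokens[start:term_index])

report = "Se descarta derrame pleural".split()
print(is_negated(report, 2))  # -> True ("derrame" is preceded by "descarta")
```

    Real NegEx additionally handles post-posed triggers, pseudo-negations and scope-terminating conjunctions, which a window rule alone misses.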

    Utilización de Wikipedia para la extracción de términos en el dominio biomédico: primeras experiencias

    We present a term extractor that uses Wikipedia as a semantic information source. The system has been tested on a Spanish medical corpus. We compare the results obtained using a module of a hybrid term extractor and an equivalent module that uses Wikipedia. The results show that this resource can be used for this task.
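
    The underlying idea can be sketched as a filter over candidate terms (a toy illustration: the title set below is a hard-coded stand-in, whereas the actual system consults Wikipedia itself):

```python
# Hypothetical sketch: treat a match against a known Wikipedia article title
# as semantic evidence that a candidate string is a genuine domain term.
WIKI_TITLES = {"hepatitis b", "insulina", "radiografia"}  # toy stand-in set

def filter_terms(candidates, titles=WIKI_TITLES):
    """Keep only candidates whose lowercased form matches a known title."""
    return [c for c in candidates if c.lower() in titles]

print(filter_terms(["Insulina", "paciente", "Hepatitis B"]))
# -> ['Insulina', 'Hepatitis B']
```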